Multimodal data integration for enhanced longitudinal prediction for cardiac and cerebrovascular events following initial diagnosis of obstructive sleep apnea syndrome

Background Obstructive sleep apnea syndrome (OSAS), a prevalent condition, often coexists with intricate metabolic issues and is frequently associated with negative cardiovascular outcomes. We developed a longitudinal prediction model integrating multimodal data for cardiovascular risk stratification of patients with an initial diagnosis of OSAS.
Methods We reviewed the data of patients with new-onset OSAS who underwent diagnostic polysomnography between 2018 and 2019. Patients were treated using standard treatment regimens according to clinical practice guidelines.
Results Over a median follow-up of 32 months, 98/729 participants (13.4%) experienced our composite outcome. At a ratio of 7:3, cases were randomly divided into development (n = 510) and validation (n = 219) cohorts. A prediction nomogram was created using six clinical factors: sex, age, diabetes mellitus, history of coronary artery disease, triglyceride-glucose index, and apnea-hypopnea index. The prediction nomogram showed excellent discriminatory power, based on Harrell’s C-index values of 0.826 (95% confidence interval (CI) = 0.779–0.873) for the development cohort and 0.877 (95% CI = 0.824–0.930) for the validation cohort. Moreover, comparing the predicted and observed major adverse cardiac and cerebrovascular events in both development and validation cohorts indicated that the prediction nomogram was well-calibrated. Decision curve analysis demonstrated the good clinical applicability of the prediction nomogram.
Conclusions We constructed an innovative visualisation tool that utilises various types of data to predict poor outcomes in Chinese patients diagnosed with OSAS, supporting accurate and personalised therapy.
Registration Chinese Clinical Trial Registry ChiCTR2300075727.

Obstructive sleep apnea syndrome (OSAS) is a serious health condition that can negatively impact individuals' overall well-being, particularly by accelerating atherosclerosis and increasing cardiovascular disease risk [1][2][3]. Early detection and intervention of OSAS have been shown to significantly improve clinical outcomes and enhance sleep-related quality of life [1][2][3]. Thus, the promotion of public awareness regarding OSAS and the evaluation of individuals' ability to assess the risk of adverse events in OSAS patients are crucial for encouraging appropriate medical attention-seeking behaviour, thereby preventing or managing the associated detrimental consequences. Notably, significant efforts are being made in well-resourced settings to diagnose and treat individuals with OSAS. However, the available data suggest that most cases of OSAS remain undiagnosed and untreated [1][2][3][4]. In China, there is generally a lack of awareness regarding OSAS, and diagnostic and treatment options are often unavailable or not adapted for resource-poor settings [1]. Multiple studies have demonstrated that continuous positive airway pressure (CPAP) treatment for moderate-to-severe OSAS with coexisting cardiovascular disease offers little benefit in terms of cardiovascular risk reduction [5,6]. Therefore, strategies for early prediction of OSAS and risk stratification are needed to address this health problem and prevent irreversible harm to individuals and society.
OSAS advances through an intricate and evolving process that can result in serious complications and is associated with various immunometabolic disorders, increasing the risk of cardiovascular events [1][2][3][4][7]. Accordingly, a one-size-fits-all approach to the diagnosis and management of OSAS does not reflect the complexity of the disease, and it cannot quantify the severity or risks of cardiovascular events for individual cases. Thus, a growing body of research aims to identify reliable markers for the severity of OSAS and the risks of adverse outcomes [7][8][9][10][11]. The available data support a transition from focusing on multiple dimensions of a multifactorial response network to understanding OSAS as a metabolic disorder affected by various systemic factors [11]. The availability of novel statistical methods has allowed the construction of visual risk stratification tools incorporating multimodal data sets, with a putative predictive factor in each set [12][13][14][15][16]. A decade-long historical cohort study of 10 149 patients diagnosed with or referred for OSAS found that patients' demographic and clinical features combined with physiologic indices provided better prognostic value for predicting cardiovascular events and all-cause mortality. However, this study did not include validation cohorts or establish the clinical applicability of its prognostic model [15]. Additionally, the previous model did not consider inflammatory, metabolic, or other important categories of biomarkers.
To the best of our knowledge, no studies have investigated the value of integrating multimodal data (clinical examination data, biomarkers, and sleep monitoring data) for longitudinal prediction of outcomes following the initial diagnosis of OSAS in a Chinese population. The purpose of the present study was to develop a predictive model for prognosis in Chinese patients with a new diagnosis of OSAS who were treated using standard treatment regimens, based on multimodal data of factors that influence outcomes in these patients.

Study design and participants
Our study used data from a cohort study (OSAS patients, n = 729) from January 2018 to December 2019. We included the data from all consecutive patients with OSAS who underwent a successful polysomnography examination for initial diagnosis at the Xinjiang Medical University Affiliated Hospital of Traditional Chinese Medicine. The study time frame of 2018–19 represents the most up-to-date data available at the study initiation, ensuring that the findings accurately reflect current clinical practices and patient characteristics. Moreover, the chosen time frame aligns with the publication of significant guidelines and consensus statements on the diagnosis and management of OSAS, including the 2017 American Academy of Sleep Medicine clinical practice guidelines [17]. These guidelines provided updated recommendations for diagnostic criteria, treatment protocols, and follow-up strategies, potentially influencing clinical decision-making and patient outcomes during the study period [17]. By focusing on the years 2018–19, our objective was to examine the impact of these recent guidelines on real-world clinical practice and ensure the relevance of our findings in relation to the current landscape of OSAS management. All eligible cases were randomly assigned to the development (n = 510) and validation (n = 219) cohorts. Table S1 in the Online Supplementary Document contains a thorough list of the criteria for including and excluding participants in this research. Figure S1 in the Online Supplementary Document presents a flowchart of the study procedure.
The study was approved by the institutional review board of Xinjiang Medical University Affiliated Hospital of Traditional Chinese Medicine (number 2022XE0103-1). Because this was a retrospective study, the institutional review board waived the requirement for informed consent from patients.

Sleep monitoring
The Grael system from Compumedics (Melbourne, Australia) was utilised to conduct comprehensive overnight laboratory-based polysomnography for each eligible participant. The staff members responsible for analysing the polysomnography data were explicitly blinded to the patient's clinical status, ensuring they remained unaware of any clinical diagnoses, treatments, or outcomes associated with the patients from whom the polysomnography data were obtained. Sleep-related variables derived from sleep monitoring included lowest arterial oxygen saturation, lowest heart rate during sleep, highest heart rate during sleep, and mean heart rate. An apnea was scored when airflow dropped by at least 90% from the pre-event baseline for at least 10 seconds.
Hypopnea was identified as a reduction in airflow of at least 30% compared to the baseline for a minimum of 10 seconds, along with a decrease in oxygen saturation of at least 4%, or as a decrease in airflow of at least 50% lasting for at least 10 seconds together with a decrease in oxygen saturation of at least 3%. The apnea-hypopnea index (AHI) was calculated as the number of apnea and hypopnea events per hour of sleep. The recorded data were manually scored by two experienced sleep technicians in accordance with the 2017 American Academy of Sleep Medicine Clinical Practice Guideline [17,18]. Before sleep monitoring, the neck circumference, abdominal girth, weight, and height were measured for each patient. Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in metres. Following discharge, patients received treatment in accordance with clinical practice guidelines [16], which encompassed lifestyle modifications such as weight reduction, abstinence from alcohol consumption, promotion of regular sleep patterns, and utilisation of CPAP as the primary therapeutic approach for moderate and severe cases.
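The two derived quantities defined above, AHI and BMI, can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical and are not taken from the study's analysis code; the sketch only restates the definitions given in the text.

```python
def apnea_hypopnea_index(n_apneas: int, n_hypopneas: int, total_sleep_hours: float) -> float:
    """AHI: scored apnea plus hypopnea events per hour of sleep."""
    if total_sleep_hours <= 0:
        raise ValueError("total_sleep_hours must be positive")
    return (n_apneas + n_hypopneas) / total_sleep_hours


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI: weight in kilograms divided by the square of height in metres."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


# Illustrative (made-up) values: 60 apneas and 90 hypopneas over 7.5 hours of sleep
print(apnea_hypopnea_index(60, 90, 7.5))
print(round(body_mass_index(81.0, 1.80), 1))
```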

Laboratory tests
The morning following admission, a fasting blood sample was collected from a peripheral vein and tested for biomarkers such as fasting glucose, haemoglobin A1c (HbA1c), lipids, fibrinogen, and D-dimer levels, along with kidney function biomarkers and standard blood cell counts. The fibrinogen level, white blood cell count, neutrophil-to-lymphocyte ratio, and platelet-to-lymphocyte ratio were identified as inflammation biomarkers. Furthermore, the triglyceride-glucose (TyG) index was determined to assess insulin resistance, calculated as ln[fasting triglyceride (mg/dl) × fasting plasma glucose (mg/dl)/2] [19].
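As an illustration of the TyG formula above, the following short Python sketch computes the index from fasting values expressed in mg/dl. The function name and the example values are assumptions made for illustration only.

```python
import math


def tyg_index(triglycerides_mg_dl: float, glucose_mg_dl: float) -> float:
    """TyG index = ln(fasting triglyceride [mg/dl] x fasting glucose [mg/dl] / 2)."""
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("both inputs must be positive and expressed in mg/dl")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


# Illustrative (made-up) values: triglyceride 150 mg/dl, fasting glucose 100 mg/dl
print(round(tyg_index(150.0, 100.0), 2))
```

Both inputs must be in mg/dl; values reported in mmol/l would need to be converted before applying the formula.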

Clinical endpoint
We conducted follow-up visits or telephone interviews to monitor patients for the occurrence of our primary clinical endpoint, major adverse cardiac and cerebrovascular events (MACCEs). MACCEs comprised cardiac mortality, acute coronary syndrome, and stroke. The diagnosis of each event was determined according to well-established clinical criteria and evaluated by an independent expert committee that remained unaware of the participant's group assignment. We conducted thorough assessments at every visit, encompassing physical examinations, medical record reviews, and adverse event evaluations. To maintain methodological consistency, a standardised questionnaire was employed for data collection during both face-to-face visits and telephone interviews. All participant-reported events underwent rigorous verification through meticulous scrutiny of medical records followed by evaluation from an unbiased committee. The average duration of follow-up was 30.65 months for the study population.

Constructing the predictive model
We have presented a comprehensive inventory of the multimodal data incorporated in our models, selected based on their clinical relevance and potential impact on MACCEs. The variables encompassed demographic characteristics such as age and sex; clinical factors including BMI, smoking status, alcohol consumption status, and comorbidities; laboratory parameters; and sleep-related variables. We utilised the createDataPartition function from the caret package to split the data set into development and validation cohorts, with a designated train-test ratio of 7:3. To address the problem of overfitting, we incorporated a penalty parameter, also known as a tuning parameter, into the adaptive least absolute shrinkage and selection operator (Lasso) regression model. This parameter was utilised to penalise the coefficients of variables included in the model, thereby mitigating the risk of overfitting [20]. Moreover, this approach enables the identification of the most relevant predictors from a potentially extensive pool of variables, which proves especially valuable in our study involving multiple clinical variables [20]. In the development cohort, potential predictors were screened using Lasso regression with 10-fold cross-validation.
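The idea behind an outcome-stratified 7:3 split can be sketched in a few lines of Python. This is not the caret implementation and not the study's code; it is a simplified stand-in, with a hypothetical function name and an arbitrary seed, that samples within each outcome class so that the event rate is similar in both cohorts.

```python
import random
from collections import defaultdict


def stratified_split(outcomes, train_fraction=0.7, seed=2024):
    """Split sample indices into train/test sets, sampling within each outcome class."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) for reproducibility
    by_class = defaultdict(list)
    for idx, label in enumerate(outcomes):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Synthetic outcome vector with the event count reported in the abstract (98 of 729)
outcomes = [1] * 98 + [0] * 631
train_idx, test_idx = stratified_split(outcomes)
print(len(train_idx), len(test_idx))
```

The exact cohort sizes depend on the rounding rule applied within each class, so this sketch will not necessarily reproduce the 510/219 split reported in the study.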
Subsequently, the prognostic model of independent factors for MACCEs was constructed based on factors identified using multivariate Cox's proportional hazards model, which allowed for calculating hazard ratios (HRs) with 95% confidence intervals (CIs).
The performance of the prognostic model was evaluated using established criteria [12,21]. Using receiver operating characteristic curve analysis, we evaluated the discriminative ability of the predictive model by computing Harrell's concordance index (C-index) and its 95% CI. Calibration plots were created to assess the accuracy of the prognostic model by comparing predicted MACCE risk probabilities at two and three years with observed outcomes. The net benefit of the prognostic model was evaluated at various threshold probabilities through decision curve analysis (DCA).
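To make the two performance measures above concrete, the sketch below gives a plain Python version of Harrell's C-index for right-censored data and of the net benefit used in decision curve analysis. Function names and toy data are assumptions. The net benefit function treats the outcome as binary at a fixed time horizon, which is a simplification of the survival setting analysed in this study.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs in which the higher-risk patient fails first.

    A pair (i, j) is comparable when patient i had an event and a shorter
    follow-up time than patient j. Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def net_benefit(predicted_risks, had_event, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(predicted_risks)
    pairs = list(zip(predicted_risks, had_event))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y)
    false_pos = sum(1 for p, y in pairs if p >= threshold and not y)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


# Toy data (made up): follow-up in months, event indicator, model risk score
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [0.9, 0.7, 0.4, 0.1]
print(harrell_c_index(times, events, scores))
print(net_benefit([0.8, 0.6, 0.3, 0.1], [1, 0, 1, 0], threshold=0.2))
```

A C-index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. A decision curve is obtained by evaluating the net benefit over a grid of threshold probabilities and comparing it with the treat-all and treat-none strategies.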
The net benefit of the prognostic model was compared with that of individual models for each of the six variables (sex, age, diabetes mellitus, history of coronary artery disease, TyG index, and AHI) in both the development and validation groups. We utilised the predictive algorithm to calculate a personalised risk assessment for every patient in both the development and validation groups, resulting in the identification of low-risk and high-risk categories based on the most effective threshold for each group. Furthermore, the overall MACCE-free survival rate was assessed through the Kaplan-Meier technique, categorising patients into low-risk and high-risk categories based on the outcomes. We utilised a log-rank test to assess the difference in MACCE risk between the low-risk and high-risk categories.
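The risk-group comparison described above rests on the Kaplan-Meier product-limit estimator. A minimal Python sketch follows, with hypothetical function names and made-up data. The grouping function takes the cutoff as a parameter because the threshold actually used in the study is not reproduced here, and the log-rank test is omitted.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)  # still under follow-up just before t
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve


def risk_groups(scores, cutoff):
    """Label each patient 'high' when the score exceeds the cutoff, else 'low'."""
    return ["high" if s > cutoff else "low" for s in scores]


# Toy data (made up): follow-up in months and MACCE indicator (1 = event, 0 = censored)
times = [6, 12, 18, 24, 30]
events = [1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

In practice the estimator would be applied separately to the low-risk and high-risk groups, and the resulting curves compared with a log-rank test.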

Statistical analysis
Statistical analysis was conducted using R, version 4.2.2 (R Core Team, Vienna, Austria) and SPSS, version 23 (IBM, Armonk, New York, USA). As descriptive statistics, percentages and frequencies were reported for categorical variables, and median values with interquartile ranges (IQR) or mean values with standard deviations were reported for continuous variables. We compared non-normally distributed continuous data with the Mann-Whitney U test, while differences in categorical variables were assessed using the Fisher exact test or χ² test. A P-value <0.05 was considered statistically significant for all comparisons.

Clinical features of the study population
The development and validation cohorts included 510 and 219 eligible patients, respectively. A total of 169 patients underwent CPAP treatment. The median duration from the initial OSAS diagnosis to the first clinical outcome evaluation was 32 months (IQR = 27.0–35.0) for the entire cohort. Comparable rates of MACCEs were observed in the development and validation groups. Additionally, no differences in sleep monitoring data, laboratory test results, and demographic characteristics were detected between the development and validation cohorts (Table 1). This comparability between cohorts reduces the potential for bias in evaluating the performance of the predictive model and supports the generalisability of our findings to similar populations.

Prediction nomogram
In the Lasso regression, 43 clinical characteristics were included, and the optimal penalisation was determined using 10-fold cross-validation. The following potential predictors of MACCEs were identified: sex, age, diabetes mellitus, previous coronary artery disease (CAD), TyG index, HbA1c, triglyceride, and AHI (Figure 1, Figure 2). The correlation coefficients that demonstrate the predictive power of each variable are presented in Figure 3.
We then constructed the prediction nomogram by selecting six variables (sex, age, diabetes mellitus, previous CAD, TyG index, and AHI). These were established as predictors significant enough to be included for risk stratification at two and three years (Figure 5). Using the nomogram, the two and three-year survival probabilities are calculated by drawing a perpendicular line from the corresponding axis of each predictor to the top line labelled 'points'. Then, the points are summed for all predictors, and a line is drawn from the axis labelled 'total points' to the place where it intercepts each of the survival axes. Accordingly, along this vertical line, the predicted risk corresponding to the 'total points' represents the patient's two and three-year survival probabilities. The corresponding equations for two and three-year survival probability, with $x$ denoting total points, are as follows:

$$P_{\text{2-year}} = 0 \times x^{3} + 0.00013\,x^{2} + (-0.00199)\,x + 0.39333$$

$$P_{\text{3-year}} = 0 \times x^{3} + (-0.00107)\,x^{2} + 0.13293\,x + (-4.25466)$$

DISCUSSION
In the present study, we used data from a large population of Chinese patients newly diagnosed with OSAS who were treated using standard treatment regimens to develop a prognostic model for cardiovascular disease risk based on six common clinical variables at baseline (sex, age, diabetes mellitus, previous CAD, TyG index, and AHI), encompassing multiple data modalities. With this model, we demonstrated that longitudinal prediction of prognosis at two and three years after initial diagnosis of OSAS substantially benefited from the integration of multimodal data compared with the use of unimodal data in our Chinese population. In summary, our research findings suggest that our nomogram can be a reliable and simple tool to help identify OSAS patients at a high risk of MACCEs in primary health care facilities in China, which can guide treatment choices to enhance outcomes for these individuals.
Up to one billion middle-aged people have OSAS worldwide, and a marked increase in the number of OSAS patients has been observed with the expansion of the obesity epidemic [1][2][3][4]. However, a significant proportion of patients with OSAS remain undiagnosed, and those with moderate or severe OSAS are at high risk for poor outcomes [1][2][3][4,22,23]. With the prevalence of OSAS in our population, public awareness of the importance of controlling OSAS must be achieved. With low OSAS awareness, inadequate treatment and/or medication adherence contribute to poor prevention and control of OSAS among Chinese adults [1]. Conversely, early detection, precise preventative measures, and risk stratification lead to better OSAS patient prognosis and decrease the risk of MACCEs [7,17]. Evidence-based risk stratification can empower doctors to make appropriate decisions, give patients more control over their health, and give payers and policymakers confidence to invest in proper solutions [12,21]. A previous study provided clinical evidence that longitudinal prediction of risk of cardiovascular events in OSAS patients benefits substantially from integrating multimodal data rather than relying on unimodal data [15]. The nomogram developed in that study included patient demographic and clinical characteristics with physiologic indices to provide greater accuracy for predicting cardiovascular outcomes. However, well-established biomarker data and information about CPAP adherence were unavailable [15]. Moreover, the study lacked a validation group and did not assess the clinical applicability of the nomogram. Importantly, a global perspective on risk prediction models has rarely been presented in clinical guidelines for the development and application of risk assessment. Given the heavy burden inflicted by the high prevalence of OSAS in China currently, our study indicates that a risk nomogram that incorporates conventional clinical data, a biomarker (TyG index), and sleep monitoring data (OSAS severity based on AHI) offers a simple yet valid tool that doctors can utilise to easily assess the risk of MACCEs in patients with an initial diagnosis of OSAS. Including routine clinical variables in our model ensures their availability across diverse health care systems worldwide, thereby supporting the applicability of our model beyond the Chinese population. This universality underscores the feasibility of implementation, provided that basic clinical data collection practices are established within the respective health care settings. While the predictors are universally accessible, the prevalence and impact of certain predictors may exhibit variations across different populations due to genetic, environmental, and lifestyle factors. To adapt our model for utilisation in diverse populations, it would be imperative to conduct population-specific recalibration by adjusting the weight assigned to each predictor based on its relative importance or prevalence within the new population. Furthermore, our prognostic model sets a benchmark for future research in OSAS and cardiovascular disease. It highlights the importance of considering various variables in risk prediction models, including those specific to OSAS. Future studies can build upon our approach, integrating additional biomarkers or clinical variables to refine risk prediction and patient management further.
Accumulating evidence and our data support the notion that OSAS and cardiovascular and cerebrovascular diseases have shared risk factors [7,24,25]. Consistent with previous research [24,25], our study showed that patients with OSAS who are older, have diabetes, and have a history of CAD are at a higher risk of developing MACCEs. Also, assessment of the severity of OSAS through the AHI in our model is consistent with previous research indicating that a higher severity of OSAS often corresponds to a poorer prognosis for patients [6,15,26]. Including the AHI in the visual risk assessment tool probably boosted the tool's additional prognostic value and its capacity to improve clinical decision-making. Currently, insulin resistance (IR) plays a pivotal role in the diagnosis and prognosis of OSAS and adverse cardiovascular events, enabling triage decisions in the high-risk care of the OSAS patient [27,28]. The development and progression of OSAS involve multiple pathophysiological mechanisms, among which IR is significant [29]. Studies have shown an increased incidence of IR in OSAS patients and that the degree of IR is positively correlated with the severity of OSAS [27][28][29]. IR affects not only glucose but also lipid metabolism and blood pressure regulation, exacerbating the pathophysiological process of OSAS and ultimately leading to a poorer prognosis [27][28][29][30][31][32]. Accumulating evidence supports that the TyG index is a reliable marker of IR [19,33,34]. Additionally, previous literature has shown that the TyG index can independently predict adverse outcomes in individuals with OSAS [10,28,35,36]. Our data showed that the TyG index enhanced the predictive accuracy of our nomogram for MACCEs, which is important for patients with OSAS. Notably, our model emphasises the necessity of moving beyond solely focusing on the AHI and instead advocates for adopting a comprehensive, patient-centred risk assessment approach to manage OSAS effectively. Indeed, successful OSAS management necessitates thorough consideration of various factors, including patient symptoms, comorbidities, and biomarkers. Moreover, the accurate prediction of cardiovascular disease risk by our model enables digital health intervention providers to customise treatment strategies based on each patient's risk profile [36,37]. For instance, patients identified as high-risk may receive more comprehensive management of OSAS and associated cardiovascular risk factors, potentially including earlier initiation of CPAP therapy, more aggressive control of blood glucose and dyslipidemia, and closer monitoring for the development of cardiovascular complications. Integrating our prognostic model into clinical practice can support clinicians in making informed decisions regarding prioritising resources and interventions. This is particularly relevant in settings where health care resources are limited, and there is a need to identify patients most likely to benefit from specific treatment modalities. To enhance the utility and accessibility of our predictive model, we propose developing a web-based application. This platform will empower health care professionals to input patient-specific information and automatically compute the risk probabilities associated with the condition of interest. Upon entering the patient's clinical variables into the application, the predictive model will promptly calculate and display a real-time risk probability assessment. This instantaneous evaluation will assist clinicians in making well-informed decisions regarding patient care and management.
This study has several limitations. First, selection bias could not be avoided because this was a retrospective study. A prospective clinical study is needed to gather more substantial clinical evidence, the results of which could then be used to enhance the prediction model. Incorporating multicentre studies involving institutions from diverse geographical locations and health care systems would encompass a broader range of patient demographics. This is crucial for evaluating the predictive model's performance across different ethnicities, lifestyles, and genetic backgrounds, and it would enhance the generalisability and applicability of the study findings. Second, the study population was treated at a single centre, which may reduce the generalisability of our findings. Third, as external validation data were not obtained, our study included only internal validation. Fourth, assuming that all predictor levels will remain unchanged throughout the follow-up period is overly simplistic. The trajectories of these predictors, such as lifestyle modification or OSAS progression over time, were unknown for our patients and could not be evaluated at baseline when the risk prediction was initially made. Further, emerging technologies, such as machine learning algorithms, may enhance the accuracy and relevance of existing predictive models. Future research will explore applying advanced machine learning techniques, such as deep learning, ensemble methods, and natural language processing, to enhance the model's performance. These methodologies can handle complex interactions and nonlinear relationships more effectively than traditional statistical models. Machine learning also provides tools for identifying and mitigating biases in predictive models. Fairness algorithms will be incorporated in future iterations of our model to ensure equitable predictions that do not inadvertently disadvantage any specific patient group. Finally, due to the retrospective nature of our study, we encountered certain limitations in acquiring comprehensive information regarding CPAP usage, patient adherence, and weight fluctuations. The heterogeneity in the types of respiratory devices employed by the patients presents additional obstacles in standardising the parameters.

CONCLUSIONS
This research presents a novel predictive model based on various data sources that provides Chinese patients and their health care providers with a numerical risk assessment for MACCEs at the time of OSAS diagnosis in patients receiving standard treatments. This model provides a more intuitive and robust scientific tool for the predictive, preventive, personalised, and participatory care of these patients. Future research should prioritise the development of more efficient methodologies to integrate data, establish optimal practices for evaluating model effectiveness across diverse data types, and conduct prospective studies with large sample sizes to validate the clinical utility of these models.

Figure 1. Least absolute shrinkage and selection operator regression plot of the model coefficient trendlines for the 43 variables potentially associated with the risk of MACCEs in obstructive sleep apnea syndrome patients. MACCEs – major adverse cardiac and cerebrovascular events

Figure 3. Strength of correlation between each variable and MACCEs risk according to correlation coefficients based on varying values of λ one standard error. The y-axis denotes the different variables, while the x-axis represents the magnitude of the correlation coefficient. MACCEs – major adverse cardiac and cerebrovascular events

Figure 4. Forest plot with hazard ratios, 95% confidence intervals, and corrected P-values for independent prognostic variables identified by multivariate Cox regression analysis.

Figure 5. A prognostic nomogram was constructed from the optimal multivariate Cox regression to predict two and three-year survival probabilities from the initial diagnosis of obstructive sleep apnea syndrome in the development cohort.

Figure 6. Receiver operating characteristic curve analysis of the discriminative ability of the prognostic nomogram. Panel A. Development cohort. Panel B. Validation cohort.

Figure 7. Calibration curves for the prognostic model. Panel A. Development cohort. Panel B. Validation cohort.

Figure 8. Decision curve analysis of the prognostic nomogram's ability to predict the risk of MACCEs. Panel A. At two years in the development cohort. Panel B. At three years in the development cohort. Panel C. At two years in the validation cohort. Panel D. At three years in the validation cohort. MACCEs – major adverse cardiac and cerebrovascular events

Figure 9. Net benefit rate of the prediction nomogram and separate models for the six independent variables (sex, age, diabetes mellitus, previous coronary artery disease, triglyceride-glucose index, and apnea-hypopnea index) in the development and validation cohorts. Panel A. Prediction of two-year MACCEs risk for the development cohort. Panel B. Prediction of two-year MACCEs risk for the validation cohort. Panel C. Prediction of three-year MACCEs risk for the development cohort. Panel D. Prediction of three-year MACCEs risk for the validation cohort. MACCEs – major adverse cardiac and cerebrovascular events

Figure 10. Cumulative MACCE-free survival stratified according to MACCE risk based on the median nomogram score. Panel A. Development cohort. Panel B. Validation cohort. MACCEs – major adverse cardiac and cerebrovascular events

Table 1. Baseline patient characteristics and MACCEs (clinical outcome) in the development and validation cohorts